Lightweight LCP-Array Construction in Linear Time
Authors
Abstract
The suffix tree is a very important data structure in string processing, but it suffers from a huge space consumption. In large-scale applications, compressed suffix trees (CSTs) are therefore used instead. A CST consists of three (compressed) components: the suffix array, the LCP-array, and data structures for simulating navigational operations on the suffix tree. The LCP-array stores the lengths of the longest common prefixes of lexicographically adjacent suffixes, and it can be computed in linear time. In this paper, we present new LCP-array construction algorithms that are fast and very space efficient. In practice, our algorithms outperform the currently best algorithms.
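As an illustration of the definition above, the following is a minimal sketch that builds a suffix array by plain sorting and then computes the LCP-array directly from the definition. It is not one of the algorithms presented in the paper and it does not run in linear time. The function names and the convention that `lcp[0] = 0` and `lcp[i]` compares the suffixes at `sa[i-1]` and `sa[i]` are assumptions made for this example.

```python
def suffix_array(text):
    # Naive construction: sort suffix start positions by the suffix itself.
    # Fine for a tiny example, far too slow for real inputs.
    return sorted(range(len(text)), key=lambda i: text[i:])


def common_prefix_length(a, b):
    # Count the leading characters that a and b share.
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def lcp_array_naive(text, sa):
    # Assumed convention: lcp[0] = 0, and lcp[i] is the length of the longest
    # common prefix of the suffixes starting at sa[i-1] and sa[i].
    lcp = [0] * len(sa)
    for i in range(1, len(sa)):
        lcp[i] = common_prefix_length(text[sa[i - 1]:], text[sa[i]:])
    return lcp


sa = suffix_array("banana")
print(sa)                             # [5, 3, 1, 0, 4, 2]
print(lcp_array_naive("banana", sa))  # [0, 1, 3, 0, 0, 2]
```

For "banana" the sorted suffixes are a, ana, anana, banana, na, nana, so for instance the entry 3 records that "ana" and "anana" share the prefix "ana".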
Similar resources
Fast and Lightweight LCP-Array Construction Algorithms
The suffix tree is a very important data structure in string processing, but it suffers from a huge space consumption. In large-scale applications, compressed suffix trees (CSTs) are therefore used instead. A CST consists of three (compressed) components: the suffix array, the LCP-array, and data structures for simulating navigational operations on the suffix tree. The LCP-array stores the leng...
Inducing the LCP-Array
We show how to modify the linear-time construction algorithm for suffix arrays based on induced sorting (Nong et al., DCC’09) such that it computes the array of longest common prefixes (LCP-array) as well. Practical tests show that this outperforms recent LCP-array construction algorithms (Gog and Ohlebusch, ALENEX’11).
Two Space Saving Tricks for Linear Time LCP Array Computation
In this paper we consider the linear time algorithm of Kasai et al. [6] for the computation of the Longest Common Prefix (LCP) array given the text and the suffix array. We show that this algorithm can be implemented without any auxiliary array in addition to the ones required for the input (the text and the suffix array) and the output (the LCP array). Thus, for a text of length n, we reduce t...
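For orientation, here is a sketch of the commonly taught form of the linear-time algorithm of Kasai et al., which takes the text and the suffix array as input. Note that this version allocates an auxiliary inverse suffix array (`rank`), so it is not the space-saving variant this paper describes. The name `kasai_lcp` and the convention `lcp[0] = 0` are assumptions for the example.

```python
def kasai_lcp(text, sa):
    n = len(text)
    # rank[p] = position of the suffix starting at p within the suffix array.
    rank = [0] * n
    for i, start in enumerate(sa):
        rank[start] = i

    lcp = [0] * n
    h = 0  # length of the common prefix carried over from the previous suffix
    for i in range(n):  # visit suffixes in text order, not in sorted order
        if rank[i] > 0:
            j = sa[rank[i] - 1]  # suffix that precedes suffix i in sorted order
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[rank[i]] = h
            # Dropping the first character shortens the match by at most one.
            if h > 0:
                h -= 1
        else:
            h = 0
    return lcp


print(kasai_lcp("banana", [5, 3, 1, 0, 4, 2]))  # [0, 1, 3, 0, 0, 2]
```

The counter `h` decreases by at most one per iteration and never exceeds the text length, so the inner loop performs a linear number of character comparisons in total.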
Critique "Lightweight LCP Construction for Next-Generation Sequencing Datasets"
The paper presents the first lightweight method that simultaneously computes the longest common prefix array (LCP) and the BWT of very large collections of sequences. Knowing the LCP of a collection of DNA sequences would facilitate the rapid computation of maximal exact matches, shortest unique substrings and shortest absent words. CPU-efficient algorithms for computing the LCP of a string have been describ...
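To make the relationship between the BWT and the suffix array concrete, the sketch below derives the BWT of a single short string from a naively built suffix array. It is a simplification for illustration only and not the method of the paper under critique, which targets very large sequence collections. The function name `bwt_from_suffix_array` and the use of `$` as an end marker that sorts before all other characters are assumptions of this example.

```python
def bwt_from_suffix_array(text, sentinel="$"):
    # Assumption: the sentinel does not occur in text and compares smaller
    # than every character of text (true for "$" versus ASCII letters).
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    t = text + sentinel
    # Naive suffix array by sorting; adequate only for small inputs.
    sa = sorted(range(len(t)), key=lambda i: t[i:])
    # BWT[i] is the character preceding the i-th smallest suffix. For the
    # suffix at position 0, index -1 wraps around to the sentinel.
    return "".join(t[p - 1] for p in sa)


print(bwt_from_suffix_array("banana"))  # annb$aa
```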
Advanced topics in algorithms
Lowest common ancestor algorithms are in [12, 19, 2]. Algorithms to construct suffix trees in linear time are in [22, 18, 21, 5]. Suffix arrays were introduced in [17]. The linear time construction algorithm for suffix arrays is from [14]. The simple construction of the LCP array from the suffix array is from [15]. The k-mismatch problem is discussed in [7, 16, 1]. The FM index is from [6]. Som...
Journal title:
- CoRR
Volume abs/1012.4263, issue
Pages -
Publication date 2010